NSF PAR Search | NSF Public Access Repository

DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback

Xiong, Guojun; Dinesha, Ujwal; Mukherjee, Debajoy; Li, Jian; Shakkottai, Srinivas (April 2025, The Thirteenth International Conference on Learning Representations (ICLR 2025))

Restless multi-armed bandits (RMAB) have been widely used to model constrained sequential decision-making problems, where the state of each restless arm evolves according to a Markov chain and each state transition generates a scalar reward. However, the success of RMAB relies crucially on the availability and quality of reward signals, and specifying an exact reward function in practice can be challenging or even infeasible. In this paper, we introduce Pref-RMAB, a new RMAB model in the presence of preference signals, where at each decision epoch the decision maker observes only pairwise preference feedback from the activated arms rather than a scalar reward. Preference feedback arguably contains less information than a scalar reward, which makes Pref-RMAB seemingly more difficult. To address this challenge, we present a direct online preference learning (DOPL) algorithm for Pref-RMAB that efficiently explores the unknown environment, adaptively collects preference data in an online manner, and directly leverages the preference feedback for decision making. We prove that DOPL yields a sublinear regret. To the best of our knowledge, this is the first algorithm to ensure $$\tilde{\mathcal{O}}(\sqrt{T\ln T})$$ regret for RMAB with preference feedback. Experimental results further demonstrate the effectiveness of DOPL.
Full Text Available
CONGO: Compressive Online Gradient Optimization

Carleton, Jeremy; Vijaykumar, Prathik; Saxena, Divyanshu; Narasimha, Dheeraj; Shakkottai, Srinivas; Akella, Aditya (April 2025, ICLR, International Conference on Learning Representations 2025)

Full Text Available
Transformers are Provably Optimal In-context Estimators for Wireless Communications

Kunde, Vishnu Teja; Valmeekam, Chandra Shekhara Kaushik; Narayanan, Krishna; Chamberland, Jean-Francois; Kalathil, Dileep; Shakkottai, Srinivas (May 2025, JMLR (Journal of Machine Learning Research), Cambridge MA)

Full Text Available
Meta-Learning for Fast Adaption in Caching Networks

https://doi.org/10.1109/TNET.2024.3478853

Narasimha, Dheeraj; Kalathil, Dileep; Shakkottai, Srinivas (January 2025, IEEE Transactions on Networking)

Full Text Available
Transformers are Provably Optimal In-context Estimators for Wireless Communications

Kunde, Vishnu T; Rajagopalan, Vicram; Valmeekam, Chandra SK; Narayanan, Krishna; Chamberland, Jean-Francois; Kalathil, Dileep; Shakkottai, Srinivas (May 2025, PMLR, Artificial Intelligence and Statistics (AISTATS) 2025)

Full Text Available
CONGO: Compressive Online Gradient Optimization

Carleton, Jeremy; Vijaykumar, Prathik; Saxena, Divyanshu; Narasimha, Dheeraj; Shakkottai, Srinivas; Akella, Aditya (January 2025, ICLR 2025)

We address the challenge of zeroth-order online convex optimization where the objective function's gradient exhibits sparsity, indicating that only a small number of dimensions possess non-zero gradients. Our aim is to leverage this sparsity to obtain useful estimates of the objective function's gradient even when the only information available is a limited number of function samples. Our motivation stems from the optimization of large-scale queueing networks that process time-sensitive jobs. Here, a job must be processed by potentially many queues in sequence to produce an output, and the service time at any queue is a function of the resources allocated to that queue. Since resources are costly, the end-to-end latency for jobs must be balanced with the overall cost of the resources used. While the number of queues is substantial, the latency function primarily reacts to resource changes in only a few, rendering the gradient sparse. We tackle this problem by introducing the Compressive Online Gradient Optimization framework which allows compressive sensing methods previously applied to stochastic optimization to achieve regret bounds with an optimal dependence on the time horizon without the full problem dimension appearing in the bound. For specific algorithms, we reduce the samples required per gradient estimate to scale with the gradient's sparsity factor rather than its full dimensionality. Numerical simulations and real-world microservices benchmarks demonstrate CONGO's superiority over gradient descent approaches that do not account for sparsity.
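The idea of recovering a sparse gradient from a small number of function samples can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes random ±1 measurement directions, forward finite differences, and orthogonal matching pursuit as the recovery step (a standard compressive sensing method swapped in here for illustration). The names `estimate_sparse_gradient` and `latency_cost`, and all parameter values, are hypothetical.

```python
import numpy as np


def estimate_sparse_gradient(f, x, sparsity, num_samples, delta=1e-4, seed=0):
    """Estimate an (assumed) sparse gradient of f at x from few function samples."""
    rng = np.random.default_rng(seed)
    dim = x.size
    # Random +/-1 measurement directions, one row per perturbed function sample.
    directions = rng.choice([-1.0, 1.0], size=(num_samples, dim))
    f0 = f(x)
    # Each forward difference approximates the projection <direction, grad f(x)>.
    y = np.array([(f(x + delta * a) - f0) / delta for a in directions])

    # Orthogonal matching pursuit: greedily pick the coordinate most correlated
    # with the residual, then refit by least squares on the chosen support.
    support = []
    residual = y.copy()
    coef = np.zeros(0)
    for _ in range(sparsity):
        scores = np.abs(directions.T @ residual)
        if support:
            scores[support] = 0.0  # never pick the same coordinate twice
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(directions[:, support], y, rcond=None)
        residual = y - directions[:, support] @ coef

    grad = np.zeros(dim)
    grad[support] = coef
    return grad


# Toy objective (hypothetical): depends on only 3 of 200 coordinates.
dim = 200
active = [5, 42, 117]
true_grad = np.zeros(dim)
true_grad[active] = [3.0, -2.0, 1.5]


def latency_cost(x):
    return true_grad @ x + 0.5 * np.sum(x[active] ** 2)


# 60 perturbed samples (plus one at x itself) instead of 200 coordinate probes.
grad_hat = estimate_sparse_gradient(
    latency_cost, np.zeros(dim), sparsity=3, num_samples=60
)
```

In this sketch the number of samples is chosen to scale with the sparsity level rather than with the full dimension, which is the property the abstract highlights. The sparsity level is passed in as a known quantity, which is an assumption of the example.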
Full Text Available
Helix: A RAN Slicing Based Scheduling Framework for Massive MIMO Networks

https://doi.org/10.1145/3696399

An, Qing; Pandey, Divyanshu; Doost-Mohammady, Rahman; Sabharwal, Ashutosh; Shakkottai, Srinivas (December 2024, Proceedings of the ACM on Networking)

An important aspect of 5G networks is the development of Radio Access Network (RAN) slicing, a concept wherein the virtualized infrastructure of wireless networks is subdivided into slices (or enterprises) tailored to fulfill specific use cases. A key focus in this context is efficient radio resource allocation to meet various enterprises' service-level agreements (SLAs). In this work, we introduce Helix: a channel-aware and SLA-aware RAN slicing framework for massive multiple-input multiple-output (MIMO) networks, where resource allocation extends to the spatial dimension available through beamforming. Essentially, the same time-frequency resource block (RB) can be shared across multiple users through multiple antennas. Notably, certain enterprises, particularly those operating critical infrastructure, require dedicated RB allocation, denoted as private networks, to ensure security. Conversely, some enterprises would allow resource sharing with others in the public network to maintain network performance while minimizing capital expenditure. Building on this understanding, Helix comprises scheduling schemes for both scenarios: where different slices share the same set of RBs, and where they require exclusivity of allocated RBs. We validate the efficacy of our proposed schedulers through simulation using a channel data set collected from a real-world massive MIMO testbed. Our assessments demonstrate that resource sharing across slices using our approach can yield up to a 60.9% reduction in RB usage compared to other approaches. Moreover, our proposed schedulers run significantly faster than exhaustive greedy approaches while meeting the stringent 5G sub-millisecond-level latency requirement.
Full Text Available
Structured Reinforcement Learning for Media Streaming at the Wireless Edge

https://doi.org/10.1145/3641512.3686386

Bura, Archana; Bobbili, Sarat Chandra; Rameshkumar, Shreyas; Rengarajan, Desik; Kalathil, Dileep; Shakkottai, Srinivas (October 2024, ACM)

Full Text Available
SPARC: Spatio-Temporal Adaptive Resource Control for Multi-site Spectrum Management in NextG Cellular Networks

https://doi.org/10.1145/3696405

Ghosh, Ushasi; Chiejina, Azuka; Stephenson, Nathan; Shah, Vijay K; Shakkottai, Srinivas; Bharadia, Dinesh (December 2024, Proceedings of the ACM on Networking)

This work presents SPARC (Spatio-Temporal Adaptive Resource Control), a novel approach for multi-site spectrum management in NextG cellular networks. SPARC addresses the challenge of limited licensed spectrum in dynamic environments. We leverage the O-RAN architecture to develop a multi-timescale RAN Intelligent Controller (RIC) framework, featuring an xApp for near-real-time interference detection and localization, and a xApp for real-time intelligent resource allocation. By utilizing base stations as spectrum sensors, SPARC enables efficient and fine-grained dynamic resource allocation across multiple sites, enhancing signal-to-noise ratio (SNR) by up to 7 dB, spectral efficiency by up to 15%, and overall system throughput by up to 20%. Comprehensive evaluations, including emulations and over-the-air experiments, demonstrate the significant performance gains achieved through SPARC, showcasing it as a promising solution for optimizing resource efficiency and network performance in NextG cellular networks.
Full Text Available
A Multi-Agent View of Wireless Video Streaming with Delayed Client-Feedback

https://doi.org/10.1109/INFOCOM52122.2024.10621216

Khan, Nouman; Dinesha, Ujwal; Arunachalam, Subrahmanyam; Narasimha, Dheeraj; Subramanian, Vijay; Shakkottai, Srinivas (May 2024, IEEE)

Full Text Available
